Abstract: The paper deals with the problem of reconstructing 
the topological structure of a network of dynamical systems. A 
distance function is defined in order to evaluate the "closeness" 
of two processes and a few useful mathematical properties are 
derived. Theoretical results to guarantee the correctness of the 
identification procedure for networked linear systems with tree 
topology are provided as well. Finally, the application of the 
techniques to the analysis of an actual complex network, i.e. to 
high frequency time series of the stock market, is illustrated. 



I. Introduction 

Driven by improved numerical tools, complex systems have 
attracted significant interest in many scientific fields. 
In particular, attention has focused on networks, highlighting 
the emergence of complicated phenomena from the connection of 
simple models. In this regard, a relevant impulse has been 
provided by the advances in neural network theory, which have 
underlined the importance of the connection topology in the 
realization of complex dynamics [1]. As a consequence, graph 
theory [2] has been successfully exploited to develop novel 
modeling approaches in several fields, such as Economics (see 
e.g. [3], [4], [5]), Biology (see e.g. [6], [7]) and Ecology 
(see e.g. [8], [9], [10]), especially when the investigated 
phenomena are characterized by spatial distribution and a 
multivariate analysis technique is preferred [11], [12]. 
To the best knowledge of the authors, there are very few 
theoretical results about the reconstruction of an unknown 
topology from data. In this paper, we will focus our attention 
on tree topology networks. Despite its reduced complexity 
with respect to cyclic link structures, the tree connection 
model turns out to be particularly suitable to represent a 
large variety of processes. In particular, the tree network 
scheme is effective in the description of systems with 
transportation, such as water and power supply, air and rail 
traffic, vascular systems of living organisms and channel and 
drainage networks (see e.g. [13], [9], [14], [15], [16]). It is 
worth highlighting that this kind of model is deeply related 
to the idea of delay, which characterizes the connections as 
transportation media. It is also important to recall that in 
linear dynamical system theory the transfer function is a 
powerful representation tool for delayed processes [17], [18]. 
In many situations, when the topology to be reconstructed is 
a tree, the only observable nodes are the leaves. Then, the 
usual theoretical framework is almost always set in standard 
graph theory, as in the Unweighted Pair Group Method with 
Arithmetic mean (UPGMA) [19]. Its application is mainly in 
the reconstruction of evolutionary trees, but it has also been 
widely employed in many other areas, such as communication 
systems and resource allocation. Theoretically, such a technique 
guarantees an exact reconstruction of a tree topology only under 
the strong assumption that an ultrametric is defined among 
the considered nodes. An approach based on system theory 
and identification tools is completely missing. Specifically, 
there are no approaches explicitly considering the possibility 
of dynamics among the nodes. While dynamical networks 
have been deeply studied and analyzed in automatic control 
theory, the question of reconstructing an unknown network 
of dynamical systems has not been formally investigated. In 
fact, in most application scenarios the network is given or it 
is the very objective of design. However, there are also some 
interesting situations where the network links are actually 
unknown, such as in biological neural networks, biochemical 
metabolic pathways and financial markets. Even though an 
acyclic topology may seem a quite reductive choice, given 
an intricate and connected topology we may be interested 
in "approximating" it with a tree. Such an approximation 
could be considered "satisfactory" if the most important 
connections were captured. 

In this manuscript we will develop a rigorous mathematical 
method to exactly identify the connection scheme of a 
tree topology network of noisy linear dynamical systems, 
providing a theoretical background for linear network 
modeling. In particular, in Section II we will introduce 
definitions and preliminary results which are useful to 
characterize the mathematical framework. In Section III 
our approach to topology reconstruction will be presented 
and sufficient conditions for an exact identification will be 
reported as well. In Section IV the theoretical results will 
be confirmed by practical implementations of the proposed 
technique, illustrated by means of numerical examples. In 
Section V we will show that the identification of a tree 
topology can provide useful information even for complex 
networks. To this end, we will apply our technique to the 
analysis of high frequency real data originated by a portfolio 
of financial stocks. Some final conclusions in Section VI will 
end the manuscript. 

Notation: 

$E[\cdot]$: mean operator; 

$R_{XY}(\tau) := E[X(t)Y(t+\tau)]$: cross-covariance function of 
stationary processes; 

$R_X(\tau) := R_{XX}(\tau)$: autocovariance; 

$\rho_{XY} := R_{XY}/\sqrt{R_X R_Y}$: correlation index; 

$Z(\cdot)$: Zeta-transform of a signal; 

$\Phi_{XY}(z) := Z(R_{XY}(\tau))$: cross-power spectral density; 

$\Phi_X(z) := \Phi_{XX}(z)$: power spectral density; 

with abuse of notation, $\Phi_X(\omega) = \Phi_X(e^{i\omega})$; 

$\lceil\cdot\rceil$ and $\lfloor\cdot\rfloor$: ceiling and floor function respectively; 

$(\cdot)^*$: complex conjugate. 

II. Problem set up 

In this section we formally introduce a model to address 
noisy linear dynamical systems interconnected to form a tree 
topology and we also provide a quantitative tool to characterize 
the mutual dependencies. 

Let us consider a network of $n$ time-discrete SISO linear 
dynamical systems affected by additive noises. Then, let $H_j(z)$ 
be the transfer function of the $j$-th system, 
$\{X_j(k)\}_{k\in\mathbb{Z}}$ and $\{U_j(k)\}_{k\in\mathbb{Z}}$ its 
output and input signals respectively and 
$\{g_j(k)\}_{k\in\mathbb{Z}}$ a zero-mean wide-sense stationary noise. 
Hence, each system can be represented according to the model: 

$$X_j(k) = H_j(z)U_j(k) + g_j(k) \qquad \forall j = 1,\dots,n. \tag{1}$$

We stress that no assumptions on the causality of $H_j(z)$ have 
been made. Moreover, let the property 

$$E[g_i g_j] = 0 \qquad \forall i \neq j \tag{2}$$

hold. Then, suppose that the input signal $U_j$ of each node 
is the output of another process and that the systems of 
the network are connected to form a tree topology, preventing 
the presence of cycles. 

In this paper we will formally address this kind of network 
according to the following definition. 

Definition 1: Consider the ensemble of a rooted tree topology 
of $n$ nodes $N_j$ and a corresponding set of $n$ linear 
time-discrete SISO systems affected by noise, described according 
to the model (1). Namely, assume $N_1$ as the root node. 
Moreover, let $\{g_j\}_{j=1,\dots,n}$ be zero-mean wide-sense stationary 
random processes satisfying (2), i.e. mutually uncorrelated 
zero-mean noises. Then, we define Linear Cascade Model Tree 
(LCMT) a dynamical network defined by the equation system 

$$\begin{cases} X_1 = H_1(z)X_{\pi_1} + g_1 \\ \quad\vdots \\ X_n = H_n(z)X_{\pi_n} + g_n \end{cases} \tag{3}$$

where $H_1(z) = 0$ and the set $\{\pi_1,\dots,\pi_n\}$ is a permutation 
of $\{1,\dots,n\}$. 

Definition 2: A LCMT is well-posed if $\Phi_{g_j}(\omega) > 0$ for all 
$g_j$ and for all $\omega$. 

Assuming to have a complete statistical knowledge of each 
process $\{X_i\}_{i=1,\dots,n}$, we are interested in the identification of 
the links, which describe the tree characterizing the network 
topology. To this aim, hereafter we introduce some preliminary 
results, which can be exploited to define a mathematical tool 
for the quantitative characterization of the connections. 

Let us consider two stochastic processes $X_i$, $X_j$ and let 
$W_{ji}(z)$ be a time-discrete SISO transfer function. Hence, 
consider the quadratic cost 

$$E[\varepsilon_Q^2]$$

where 

$$\varepsilon_Q := Q(z)\,(X_j - W_{ji}(z)X_i) \tag{4}$$

and $Q(z)$ is an arbitrary stable and causally invertible 
time-discrete transfer function weighting the error 

$$e_{ji} := X_j - W_{ji}(z)X_i.$$

Then, the computation of the transfer function $W_{ji}(z)$ that 
minimizes the quadratic cost (4) is a well-known problem in the 
scientific literature and its solution is referred to as the Wiener 
filter mi. 

Proposition 3 (Wiener filter): The Wiener filter modeling 
$X_j$ by $X_i$ is the linear stable filter $W_{ji}$ minimizing the filtered 
quantity (4). Its expression is given by 

$$W_{ji}(z) = \frac{\Phi_{X_iX_j}(z)}{\Phi_{X_i}(z)} \tag{5}$$

and it does not depend upon $Q(z)$. Moreover, the minimized 
cost is equal to 

$$\min E[\varepsilon_Q^2] = \frac{1}{2\pi}\int_{-\pi}^{\pi} |Q(\omega)|^2 \left(\Phi_{X_j}(\omega) - |\Phi_{X_jX_i}(\omega)|^2\,\Phi_{X_i}^{-1}(\omega)\right) d\omega,$$

and the corresponding error 

$$e_{ji} := X_j - W_{ji}(z)X_i$$

is not correlated with $X_i$, i.e. 

$$E[e_{ji}X_i] = 0. \tag{6}$$

Proof: See, for example. El, IHl. ■ 
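
As an illustration of expression (5), the following is a minimal sketch of how the frequency response of the Wiener filter might be estimated from finite data. It is not taken from the paper: the Welch-style averaging over half-overlapping Hann-windowed segments, the helper names `segment_ffts` and `wiener_filter_response`, and the parameter `nperseg` are all assumptions made for this example, and the signals are assumed to be zero-mean.

```python
import numpy as np

def segment_ffts(x, nperseg=128):
    """FFT of half-overlapping, Hann-windowed segments (one row per segment)."""
    x = np.asarray(x, dtype=float)
    starts = range(0, len(x) - nperseg + 1, nperseg // 2)
    segments = np.array([x[s:s + nperseg] for s in starts])
    return np.fft.rfft(segments * np.hanning(nperseg), axis=1)

def wiener_filter_response(x_i, x_j, nperseg=128):
    """Estimate W_ji(omega) = Phi_{XiXj}(omega) / Phi_{Xi}(omega) on the rfft grid.

    Both spectra are estimated up to the same scale factor, which cancels
    in the ratio.
    """
    f_i, f_j = segment_ffts(x_i, nperseg), segment_ffts(x_j, nperseg)
    phi_i = np.mean(np.abs(f_i) ** 2, axis=0)      # power spectral density of X_i
    phi_ij = np.mean(np.conj(f_i) * f_j, axis=0)   # cross-power spectral density
    return phi_ij / phi_i

# Toy check: X_j is a scaled copy of X_i plus a small independent noise,
# so the estimated response should stay close to the constant gain 0.5.
rng = np.random.default_rng(0)
x_i = rng.standard_normal(8000)
x_j = 0.5 * x_i + 0.05 * rng.standard_normal(8000)
W = wiener_filter_response(x_i, x_j)
```

The estimate is non-causal by construction, in line with the remark that no causality assumption is placed on the transfer functions.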
Since the weighting function $Q(z)$ does not affect the 
Wiener filter, but only the energy of the filtered error, we can 
choose $Q(z)$ equal to $F_j(z)$, the inverse of the spectral factor 
of $\Phi_{X_j}(z)$, that is 

$$\Phi_{X_j}(z) = F_j^{-1}(z)\,\left(F_j^{-1}(z)\right)^*. \tag{7}$$

In particular, it is worth recalling that $F_j(z)$ is stable and 
causally invertible [20]. Therefore, the minimum of cost (4) 
assumes the value 

$$\min E[\varepsilon_{F_j}^2] = \frac{1}{2\pi}\int_{-\pi}^{\pi}\left(1 - \frac{|\Phi_{X_iX_j}(\omega)|^2}{\Phi_{X_i}(\omega)\Phi_{X_j}(\omega)}\right) d\omega. \tag{8}$$

Observe that, due to such choice of $Q(z)$, the cost turns out 
to explicitly depend on the coherence function of the two 
processes: 

$$C_{X_iX_j}(\omega) := \frac{|\Phi_{X_iX_j}(\omega)|^2}{\Phi_{X_i}(\omega)\Phi_{X_j}(\omega)}. \tag{9}$$

Let us recall that the coherence function is non-negative and 
symmetric with respect to $\omega$. Moreover, it is also well-known 
that the cross-spectral density satisfies the Schwarz inequality 
and, thus, the coherence function is bounded between 0 
and 1. Therefore, according to the previous results, the cost 
(8) turns out to be dimensionless and not depending on the 
"energy" of the stochastic processes $X_i$ and $X_j$. 
The following result holds. 

Proposition 4: In a well-posed LCMT, the binary function 

$$d(X_i,X_j) := \left[\frac{1}{2\pi}\int_{-\pi}^{\pi}\left(1 - C_{X_iX_j}(\omega)\right) d\omega\right]^{1/2} \tag{10}$$

is a metric. 

Proof: The only non-trivial property to be proved is the 
triangle inequality. Let $W_{ji}(z)$ be the Wiener filter between 
$X_i$, $X_j$ computed according to (5) and $e_{ji}$ the relative error. 
The following relations hold: 

$$X_3 = W_{31}(z)X_1 + e_{31}$$
$$X_3 = W_{32}(z)X_2 + e_{32}$$
$$X_2 = W_{21}(z)X_1 + e_{21}.$$

Since $W_{31}(z)$ is the Wiener filter between the two processes 
$X_1$ and $X_3$, it performs better at any frequency than any other 
linear filter, such as $W_{32}(z)W_{21}(z)$. So we have 

$$\Phi_{e_{31}}(\omega) \le \Phi_{e_{32}}(\omega) + |W_{32}(\omega)|^2\,\Phi_{e_{21}}(\omega) + \Phi_{e_{32}e_{21}}(\omega)W_{32}^*(\omega) + W_{32}(\omega)\Phi_{e_{21}e_{32}}(\omega)$$
$$\le \left(\sqrt{\Phi_{e_{32}}(\omega)} + |W_{32}(\omega)|\sqrt{\Phi_{e_{21}}(\omega)}\right)^2 \qquad \forall \omega \in \mathbb{R}.$$

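For the sake of simplicity we neglect to explicitly write 
the argument $\omega$ in the following passages. Normalizing with 
respect to $\Phi_{X_3}$, we find 

$$\frac{\Phi_{e_{31}}}{\Phi_{X_3}} \le \frac{1}{\Phi_{X_3}}\left(\sqrt{\Phi_{e_{32}}} + |W_{32}|\sqrt{\Phi_{e_{21}}}\right)^2$$

and considering the 2-norm properties 

$$\left(\int_{-\pi}^{\pi}\frac{\Phi_{e_{31}}}{\Phi_{X_3}}\,d\omega\right)^{1/2} \le \left(\int_{-\pi}^{\pi}\frac{\Phi_{e_{32}}}{\Phi_{X_3}}\,d\omega\right)^{1/2} + \left(\int_{-\pi}^{\pi}\frac{|W_{32}|^2\,\Phi_{e_{21}}}{\Phi_{X_3}}\,d\omega\right)^{1/2}$$
$$= \left(\int_{-\pi}^{\pi}\frac{\Phi_{e_{32}}}{\Phi_{X_3}}\,d\omega\right)^{1/2} + \left(\int_{-\pi}^{\pi}\frac{|\Phi_{X_2X_3}|^2}{\Phi_{X_2}\Phi_{X_3}}\,\frac{\Phi_{e_{21}}}{\Phi_{X_2}}\,d\omega\right)^{1/2}$$

where we have substituted the expression of $W_{32}$. Finally, 
observing that 

$$0 \le \frac{|\Phi_{X_2X_3}|^2}{\Phi_{X_2}\Phi_{X_3}} \le 1,$$

we find 

$$d(X_1,X_3) \le d(X_1,X_2) + d(X_2,X_3). \qquad ■$$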
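
To make the distance (10) concrete, here is a small sketch of how it might be estimated from sampled data. The estimator is an assumption of this example and not the authors' implementation: spectra are averaged over half-overlapping Hann-windowed segments, the integral over $[-\pi, \pi]$ is approximated by the mean over the non-negative frequency grid (the coherence is symmetric in $\omega$), and the names `segment_ffts`, `coherence_distance` and `nperseg` are hypothetical.

```python
import numpy as np

def segment_ffts(x, nperseg=128):
    """FFT of half-overlapping, Hann-windowed segments (one row per segment)."""
    x = np.asarray(x, dtype=float)
    starts = range(0, len(x) - nperseg + 1, nperseg // 2)
    segments = np.array([x[s:s + nperseg] for s in starts])
    return np.fft.rfft(segments * np.hanning(nperseg), axis=1)

def coherence_distance(x, y, nperseg=128):
    """Finite-data estimate of d(X, Y) = sqrt(mean over omega of (1 - C_XY(omega)))."""
    fx, fy = segment_ffts(x, nperseg), segment_ffts(y, nperseg)
    pxx = np.mean(np.abs(fx) ** 2, axis=0)
    pyy = np.mean(np.abs(fy) ** 2, axis=0)
    pxy = np.mean(np.conj(fx) * fy, axis=0)
    # Clip to [0, 1] so that rounding errors cannot produce a negative radicand.
    coherence = np.clip(np.abs(pxy) ** 2 / (pxx * pyy), 0.0, 1.0)
    return float(np.sqrt(np.mean(1.0 - coherence)))

# Toy data: y is a delayed, noisy copy of x; z is independent of x.
rng = np.random.default_rng(1)
x = rng.standard_normal(4000)
y = np.roll(x, 1) + 0.5 * rng.standard_normal(4000)  # one-step delay (circular, for brevity)
z = rng.standard_normal(4000)
d_xy = coherence_distance(x, y)
d_xz = coherence_distance(x, z)
```

With these toy signals the linked pair should come out clearly closer than the independent pair, although with a finite number of segments the estimated coherence of independent signals is biased slightly above zero.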



III. Main result 

In this section we exploit the coherence-based distance 
(10) to derive sufficient conditions to guarantee the exact 
reconstruction of the topology of a dynamical network. To this 
end, we first need to introduce a few definitions and technical 
lemmas. 

Definition 5: We define "path" from $N_i$ to $N_j$ a finite 
sequence of $l > 0$ nodes $N_{\pi_1}, \dots, N_{\pi_l}$ such that 
• $N_{\pi_1} = N_i$ and $N_{\pi_l} = N_j$ 
• $N_{\pi_i}$ and $N_{\pi_{i+1}}$ are linked by an arc of the tree for $i = 1, \dots, l-1$ 
• $N_{\pi_i} \neq N_{\pi_j}$ for $i \neq j$. 

In the following we consider LCMT networked systems. It 
is worth underlining that a rooted tree is a pair made of a tree 
and one of its nodes Nr, named as "root". Hence, since a tree 
is a connected graph, in a LCMT network there is always a 
path between two nodes and, since there are no cycles, such 
a path is also unique. 

The presence of a special node labeled as "root" induces a 
natural relation of "order" among the nodes in the following 
way 

Definition 6: Given a rooted tree, consider the path from 
$N_r$ to another node $N_j$. A node $N_i$ is said to be an ancestor 
of $N_j$ if $N_i \neq N_j$ and if it belongs to the path from $N_r$ to 
$N_j$. Alternatively, we say that $N_j$ is a descendant of $N_i$. We 
also say that $N_i$ is parent of $N_j$ (or that $N_j$ is a child of $N_i$) 
if, in addition, $N_j$ and $N_i$ are connected by an arc. 

It is straightforward to prove that the root is an ancestor to 
all the other nodes and that every node but the root has exactly 
one parent. Hereafter an important result about the correlation 
property in a LCMT is introduced. 

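Lemma 7: Given a LCMT $\mathcal{T}$, consider a node $N_j$ and a 
node $N_i \neq N_j$ which is not a descendant of $N_j$. Then it holds 
that $E[g_j X_i] = 0$. 

Proof: Let $N_r$ be the root of $\mathcal{T}$ and $N_{\pi_1}, \dots, N_{\pi_l}$ the path 
from $N_r$ to $N_i$. Exploiting the linear dependencies among the 
signals of the LCMT, $X_i$ can be expressed in terms of the 
noises $g_{\pi_1}, \dots, g_{\pi_l}$: 

$$X_i = \sum_{q=1}^{l} W_{\pi_l\pi_q}\, g_{\pi_q} \tag{11}$$

where 

$$W_{\pi_l\pi_q} := \prod_{h=q+1}^{l} H_{\pi_h} \tag{12}$$

and the empty product obtained for $q = l$ is equal to 1. 
Since $N_i$ is not a descendant of $N_j$ and $N_i \neq N_j$, we have 
that $g_{\pi_q} \neq g_j$ for $q = 1, \dots, l$, thus 

$$E[g_j X_i] = E\left[g_j \sum_{q=1}^{l} W_{\pi_l\pi_q}\, g_{\pi_q}\right] = 0. \tag{13}$$

■ 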

The two following lemmas provide two important inequal- 
ities about the coherence functions related to the network 
signals. 



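Lemma 8: Consider a LCMT $\mathcal{T}$ and three nodes $N_i$, $N_j$ 
and $N_k$ such that 

• $N_k$ is a descendant of $N_j$ 

• $N_i$ is not a descendant of $N_j$ and $N_i \neq N_j$. 

Then we have that $C_{X_iX_j} \ge C_{X_iX_k}$. Moreover, if $\mathcal{T}$ is 
well-posed then the inequality is strict. 

Proof: Consider the path from $N_j$ to $N_k$ described by the 
sequence $N_{\pi_1}, \dots, N_{\pi_l}$. Exploiting the linear relations (1), the 
process $X_k$ can be expressed in terms of $X_j$ and of the noises 
acting on the nodes $N_{\pi_2}, \dots, N_{\pi_l}$, which are all descendants of 
$N_j$: 

$$X_k = W_{k\pi_1}X_j + \sum_{q=2}^{l} W_{k\pi_q}\, g_{\pi_q} \tag{14}$$

where $W_{k\pi_q}$ is defined as in (12). Now, we intend to evaluate 
the coherence between $X_i$ and $X_k$. From the assumption on 
$N_i$, it follows that $N_i$ is not on the path from $N_j$ to $N_k$. In 
other words, $N_i$ is not a descendant of $N_{\pi_q}$ and $N_i \neq N_{\pi_q}$ 
for $q = 1, \dots, l$. We can write 

$$C_{X_iX_k} = \frac{|\Phi_{X_iX_k}|^2}{\Phi_{X_i}\Phi_{X_k}} = \frac{|W_{k\pi_1}|^2\,|\Phi_{X_iX_j}|^2}{\Phi_{X_i}\left(|W_{k\pi_1}|^2\,\Phi_{X_j} + \sum_{q=2}^{l}|W_{k\pi_q}|^2\,\Phi_{g_{\pi_q}}\right)} \tag{15}$$

where the last equality holds because of Lemma 7. Collecting 
the factor $|W_{k\pi_1}|^2$ we obtain 

$$C_{X_iX_k} = C_{X_iX_j}\,\frac{1}{1 + \dfrac{\sum_{q=2}^{l}|W_{k\pi_q}|^2\,\Phi_{g_{\pi_q}}}{|W_{k\pi_1}|^2\,\Phi_{X_j}}} \le C_{X_iX_j} \tag{16}$$

where the inequality is strict if $\sum_{q=2}^{l}|W_{k\pi_q}|^2\,\Phi_{g_{\pi_q}} > 0$. ■ 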
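Lemma 9: Consider a LCMT $\mathcal{T}$ and three different nodes 
$N_i$, $N_j$ and $N_k$ such that 

• $N_k$ is a child of $N_j$ 

• $N_i \neq N_j, N_k$ and it is not a descendant of $N_k$. 

Then $C_{X_jX_k} \ge C_{X_iX_k}$. Moreover, if $\mathcal{T}$ is well-posed the 
inequality is strict. 

Proof: Assume that $X_k = H_{kj}X_j + g_k$ and let us 
distinguish two possible scenarios. 

Case A. First, consider the case where $N_j$ is a descendant of $N_i$. 
Consider the path from $N_i$ to $N_j$ described by the sequence 
of $l$ nodes $N_{\pi_1}, \dots, N_{\pi_l}$, where $N_{\pi_1} = N_i$ and $N_{\pi_l} = N_j$. The 
process $X_j$ can be expressed in terms of $X_i$ and of the noises 
acting on the nodes $N_{\pi_2}, \dots, N_{\pi_l}$, which are all descendants of 
$N_i$: 

$$X_j = W_{j\pi_1}X_i + \sum_{q=2}^{l} W_{j\pi_q}\, g_{\pi_q}. \tag{17}$$

Exploiting Lemma 7 we can evaluate the following quantities 

$$C_{X_iX_k} = \frac{|\Phi_{X_iX_k}|^2}{\Phi_{X_i}\Phi_{X_k}} = \frac{|H_{kj}|^2}{\Phi_{X_k}}\,|W_{j\pi_1}|^2\,\Phi_{X_i} \tag{18}$$

and 

$$C_{X_jX_k} = \frac{|\Phi_{X_jX_k}|^2}{\Phi_{X_j}\Phi_{X_k}} = \frac{|H_{kj}|^2}{\Phi_{X_k}}\left(|W_{j\pi_1}|^2\,\Phi_{X_i} + \sum_{q=2}^{l}|W_{j\pi_q}|^2\,\Phi_{g_{\pi_q}}\right). \tag{19}$$

By inspection we have the assertion. 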

Now we are left to consider the case where $N_j$ is not a 
descendant of $N_i$. Then, also $N_k$ is not a descendant of $N_i$. By 
hypothesis, $N_i$ is not a descendant of $N_k$, either. Thus, they 
must have a common ancestor $N_d$, such that the two paths 
from $N_d$ to $N_k$ and from $N_d$ to $N_i$ have only $N_d$ in common. 
Consider the path $N_{\pi_1}, \dots, N_{\pi_l}$ from $N_d$ to $N_i$, such that it is 
possible to write 

$$X_i = W_{i\pi_1}X_d + \sum_{q=2}^{l} W_{i\pi_q}\, g_{\pi_q}.$$

Exploiting Lemma 7 we have 

$$C_{X_iX_k} = \frac{|\Phi_{X_iX_k}|^2}{\Phi_{X_i}\Phi_{X_k}} \tag{20}$$
$$= \frac{|W_{i\pi_1}|^2\,|\Phi_{X_dX_k}|^2}{\Phi_{X_i}\Phi_{X_k}} \tag{21}$$
$$= C_{X_kX_d}\,\frac{|W_{i\pi_1}|^2\,\Phi_{X_d}}{\Phi_{X_i}} \tag{22}$$
$$\le C_{X_kX_d}. \tag{23}$$

If $N_d = N_j$, we have the assertion. If $N_d \neq N_j$, then $N_j$ must 
be a descendant of $N_d$. We are in a situation equivalent to case 
A: there is a node $N_d$ such that $N_j$ is one of its descendants. 
As a consequence, we can state that 

$$C_{X_kX_d} \le C_{X_kX_j}. \tag{24}$$

Combining the last two inequalities, we conclude that the 
lemma holds also in this case. ■ 

All the previous lemmas are instrumental in showing that the 
coherence distance (10) is minimal between two contiguous 
nodes, as summarized in this theorem. 

Theorem 10: Given a LCMT $\mathcal{T}$, consider a node $N_a$ and a 
node $N_b \neq N_a$ which is not directly linked to it. Then there 
exists a node $N_c$ directly linked to $N_a$ such that 

$$d(N_a,N_c) \le d(N_a,N_b) \tag{25}$$

where the inequality is strict if $\mathcal{T}$ is well-posed. 

Proof: First, consider the case where $N_b$ is a descendant 
of $N_a$. Name $N_c$ the child of $N_a$ on the path linking it to 
$N_b$. Since $N_c$ is directly linked to $N_a$, we have $N_b \neq N_c$. 
Moreover $N_b$ is a descendant of $N_c$. We are allowed to apply 
Lemma 8 with $N_i = N_a$, $N_j = N_c$ and $N_k = N_b$ to have 
the assertion. 

Now, consider the case where $N_b$ is not a descendant of $N_a$. 
$N_a$ cannot be the root, otherwise $N_b$ would be one of its 
descendants. Thus $N_a$ has a parent and let us name it $N_c$. 
$N_b$ cannot be $N_c$ because it is not directly linked to $N_a$. 
Applying Lemma 9 with $N_i = N_b$, $N_j = N_c$ and $N_k = N_a$ 
and by the definition of the coherence distance (10), we have 
the assertion. ■ 



Theorem 10 can be fruitfully exploited to determine whether 
two processes in a well-posed LCMT are directly linked. 
Nonetheless, when we are dealing with data sampled from 
actual systems the computation of $d$, that is of the coherence 
function, is affected by the limited time horizon of the 
observations. However, the estimates of the spectral and 
cross-spectral densities converge to the actual values as the time 
horizon approaches infinity. Hence, in the following we will 
assume to sample the processes over a sufficiently large time 
interval. 

We are ready to show the main contribution of the paper. 
Theorem 11: Consider a well-posed LCMT $\mathcal{T}$ and assume 
to observe the signals $X_j$ during a time horizon $t$. Compute 
an estimate of the coherence based distances $d_{ij} = d(X_i, X_j)$ 
among the nodes $N_j$ and evaluate the relative Minimum 
Spanning Tree (MST). When $t$ approaches infinity, the 
corresponding topology is equivalent to the unique MST $T$ 
associated to the coherence metric, which coincides with the 
topology of $\mathcal{T}$. 

Proof: The proof consists in showing that the MST $T$ 
associated to the distance (10) is unique and corresponds to 
the LCMT topology. We will prove this result by induction on 
the number $n$ of nodes of the LCMT. 

The basic induction step consists in observing that the theorem 
is true for $n = 2$. 

Now assume the theorem true for a LCMT with $n$ nodes. Given 
a LCMT $\mathcal{T}$ with $n + 1$ nodes, remove one of its "leaves". 
By leaf we mean a non-root node with no descendants. This 
operation is always possible since any rooted tree with at least 
two nodes has at least one leaf. Without loss of generality, let 
the removed leaf be $N_{n+1}$ and let $N_i$ be its parent. Now we 
have a LCMT $\mathcal{T}'$ with $n$ nodes and with the same topology as 
$\mathcal{T}$ apart from the removed arc $(i, n+1)$. Using the induction 
hypothesis, we know that the topology of $\mathcal{T}'$ is given by the 
unique MST $T'$ obtained considering the distances among the 
nodes $N_1, \dots, N_n$. Now compute 

$$i^* = \arg\min_{j < n+1} d(X_j, X_{n+1}). \tag{26}$$

The solution of such a minimization problem is unique since 
the LCMT $\mathcal{T}$ is well-posed. Because of Theorem 10 the arc 
$(i^*, n+1)$ belongs to the topology of $\mathcal{T}$, so we conclude 
$i^* = i$. Let $T$ be the spanning tree obtained by adding the arc 
$(i, n+1)$ to $T'$. So far, we have shown that $T$ represents the 
topology of $\mathcal{T}$. We have to prove that $T$ is the unique MST 
related to the distance (10) among the nodes $N_1, \dots, N_{n+1}$. 
Suppose, by contradiction, that there is a minimum spanning 
tree $\bar{T} \neq T$ with weight less than or equal to the weight of $T$. 
The only arc of $\bar{T}$ incident to the node $N_{n+1}$ is $(i, n+1)$. 
If there were another arc $(k, n+1)$ in $\bar{T}$ we could replace it 
with the arc $(k, i)$ obtaining a spanning tree with lower cost. 
Indeed, by Lemma 9 we would have 

$$d(X_k, X_i) < d(X_{n+1}, X_k). \tag{27}$$

So, if $\bar{T}$ is a minimum spanning tree, then $X_{n+1}$ can be 
connected only to $X_i$. Let $\bar{T}'$ be the tree obtained from $\bar{T}$ by 
removing the arc $(i, n+1)$. $\bar{T}'$ is the minimum spanning 
tree for the nodes $N_1, \dots, N_n$ since it has been obtained from 
$\bar{T}$ by removing the node $N_{n+1}$, which has a single connection. 
However, by the induction hypothesis, there is a unique MST 
$T'$ among the nodes $N_1, \dots, N_n$. Thus we have that $\bar{T}' = T'$. 
The contradiction $\bar{T} = T$ immediately follows. ■ 
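
The reconstruction step of Theorem 11 only needs the matrix of pairwise distances and a minimum spanning tree routine. The sketch below uses Prim's algorithm on a dense matrix; this is one standard way to compute an MST and the paper does not state which algorithm was used. The function name `minimum_spanning_tree` and the toy distance values are assumptions made for illustration.

```python
import numpy as np

def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense symmetric distance matrix.

    Returns the MST as a sorted list of undirected edges (i, j) with i < j.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()            # cheapest known link from the tree to each node
    link = np.zeros(n, dtype=int)    # tree endpoint realizing that cheapest link
    edges = []
    for _ in range(n - 1):
        # Pick the node outside the tree that is closest to the tree.
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        i = int(link[j])
        edges.append((min(i, j), max(i, j)))
        in_tree[j] = True
        # Update the cheapest links using the newly added node.
        closer = ~in_tree & (dist[j] < best)
        best[closer] = dist[j, closer]
        link[closer] = j
    return sorted(edges)

# Toy distances for a tree with arcs (0,1), (1,2), (1,3): linked nodes are
# closer to each other than any pair that is not directly linked.
toy = np.array([[0.0, 0.3, 0.6, 0.7],
                [0.3, 0.0, 0.4, 0.5],
                [0.6, 0.4, 0.0, 0.8],
                [0.7, 0.5, 0.8, 0.0]])
tree = minimum_spanning_tree(toy)  # [(0, 1), (1, 2), (1, 3)]
```

Since the returned edges are undirected, the sketch is consistent with the remark that the root cannot be recovered from the distances alone.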
So far, we have assumed that the dynamics of the network 
is described by a rooted tree. Moreover, the previous theorem 
proves that the topology structure can be correctly identified 
evaluating the MST according to the distance (10). However, 
no information is recovered about the root node. The following 
result shows that such information is not necessary (or, 
equivalently, not recoverable). Indeed, from a modeling point 
of view, the choice of the root can be arbitrary (as long as 
we are considering non-causal transfer functions linking the 
processes $X_j$). 

Theorem 12: Given a LCMT $\mathcal{T}$ whose root is the node $N_j$ 
and given one of its children $N_i$, it is possible to define another 
LCMT $\mathcal{T}^*$ with the same tree structure and described by the 
same processes $X_k$, $k = 1, \dots, n$, such that its root is $N_i$. 

Proof: Consider the Wiener filter $W_{ji}$ modeling the 
signal $X_j$, seen as the output, when $X_i$ is the input 

$$X_j = W_{ji}X_i + e_{ji}. \tag{28}$$

Now, consider a rooted tree with the same topology as $\mathcal{T}$ but 
with $N_i$ as the root. Define $H_k^* = H_k$ and $g_k^* = g_k$ for all 
$k \neq i, j$. Conversely, define 

$$H_j^* = W_{ji} \qquad g_j^* = e_{ji} \tag{29}$$

$$H_i^* = 0 \qquad g_i^* = X_i. \tag{30}$$

To show that the new dynamical network with $N_i$ as root and 
described by the filters $H_k^*$ is an LCMT, we need to prove 
that, for $h \neq k$, 

$$E[g_h^* g_k^*] = 0. \tag{31}$$

There are three possible scenarios. 

If $h = i$ and $k = j$ or $h = j$ and $k = i$, then 

$$E[g_h^* g_k^*] = 0 \tag{32}$$

because of the Wiener filter properties. 

If $h = i, j$ and $k \neq i, j$ (or equivalently $h \neq i, j$ and $k = i, j$), 
then Lemma 7 can be applied. 

If $h \neq i, j$ and $k \neq i, j$, then 

$$E[g_h^* g_k^*] = E[g_h g_k] \tag{33}$$

and we have the assertion because $g_h$ and $g_k$ are two noise 
signals of the original LCMT $\mathcal{T}$. ■ 
It is straightforward to show that, starting from an LCMT $\mathcal{T}$, 
we can define a LCMT $\mathcal{T}^*$ having an arbitrary node 
as root. Indeed, it is sufficient to iteratively apply Theorem 12 
along the path starting from the original root to the new one. 

IV. Numerical examples 

In this section we introduce a suitable framework to 
illustrate the application of the previous theoretical results to 
numerical analysis. It is worth observing that the previous 
results have been developed for the most general class of linear 
models. Indeed, no assumptions have been made on the order 
and causality property of the considered transfer functions. 
Moreover, let us highlight that the coherence based analysis 
must be realized "off-line", since the processes have to be 
evaluated over their entire time span. Thus, because the 
coherence function can be numerically computed only over 
limited intervals, in the following examples we will consider 
sufficiently long time spans to reduce the numerical error. 

Fig. 1. The figure illustrates the topology of the 10 nodes network analyzed in 
the numerical examples section. Each node is responsible for a process $X_j$, 
while the arcs describe the connections among them, according to the linear 
SISO model (1). For the data generation we have considered only transfer 
functions of at most the second order. The noises have been assumed 
to provide half the power of the affected processes. The samples have been 
collected over 1000 time steps. 

Hence, let us build the original dynamical networks according 
to the following rules: 

• each system is described according to the model (1); 

• each transfer function $H_j$ is randomly generated and such 
that it is causal and at most of the second order; 

• the tree topology is randomly chosen; 

• the noises $g_j$ are numerically generated with a 
pseudo-random algorithm; 

• the noise-to-signal ratio of each system is equal to one. 

Then, such networks are simulated over 1000 time steps and 
the related data $X_j$ are collected. The corresponding coherence 
based distances are evaluated and used for the extraction of 
the MST, which defines the link topology. 
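
The generation rules above can be sketched as follows. This is only an illustration under stated assumptions, not the code used for the reported experiments: the random tree is drawn by giving each node a parent among the previously created nodes, each $H_j$ is taken to be a causal FIR filter with three random taps (a special case of "at most second order"), the noises are Gaussian, and the names `simulate_lcmt`, `n_nodes`, `n_steps` and `seed` are hypothetical.

```python
import numpy as np

def simulate_lcmt(n_nodes=10, n_steps=1000, seed=0):
    """Simulate a tree of noisy linear systems; returns (parent, x).

    parent[j] is the index of the node driving node j (-1 for the root) and
    x has shape (n_nodes, n_steps), one row per process X_j.
    """
    rng = np.random.default_rng(seed)
    parent = np.full(n_nodes, -1)
    x = np.zeros((n_nodes, n_steps))
    x[0] = rng.standard_normal(n_steps)            # root: no input, X = g
    for j in range(1, n_nodes):
        parent[j] = rng.integers(0, j)             # random parent among earlier nodes
        taps = rng.uniform(-1.0, 1.0, size=3)      # causal FIR filter of order 2 (assumption)
        driven = np.convolve(x[parent[j]], taps)[:n_steps]
        noise = rng.standard_normal(n_steps)
        noise *= np.std(driven) / np.std(noise)    # noise-to-signal ratio equal to one
        x[j] = driven + noise
    return parent, x

parent, x = simulate_lcmt()
```

The rows of `x` would then be fed to a coherence-distance estimator and an MST routine; with only 1000 samples the recovered tree is an estimate, so exact agreement with `parent` is not guaranteed for every random draw.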

The above procedure will be first applied to a ten node 
network. In particular, to test the numerical reliability of the 
topological identification technique, we repeat such analysis 
several times, so that a significant number of network 
configurations is considered. The corresponding results fit the 
expectations and the real topology is correctly identified each 
time. In Fig. 1 one of the considered network configurations 
is depicted, while the related coherence based distance matrix 
is reported in Table I. 

To provide a further test, a new set of similar simulations is 
performed with a network of fifty dynamical systems, under 
the same assumptions used in the previous case. Figure 2 
presents one of the considered network configurations. Due to 
space limitations, we do not report in this manuscript the 
corresponding coherence based distance matrix. Nonetheless, 
the computation of the related MST has successfully identified 
the real network topology in all of the performed simulations. 



Fig. 2. A representative topological configuration of the 50 nodes network 
considered in the numerical examples section. The example has been 
designed according to the same assumptions as the ten node network of Figure 1. 

V. Stock market analysis 

In the previous section we have illustrated how the distance 
(10) can be successfully exploited to derive the exact topology 
of a tree network of linear systems affected by additive noises. 
Nonetheless, since the above identification technique is able to 
catch the most important linear dependencies with respect to 
the modeling error (4), in the following we present the results 
obtained by the application of the previous method to the stock 
market, that is a network of nonlinear systems characterized 
by multiple dependencies. 

Financial systems are, in general, very complex and deriving 
information from stock markets is indeed a formidable and 
challenging task. Moreover, the attempt to describe the 
dependencies among the price trends in terms of linear SISO 
systems with a tree topology might seem very reductive. In fact, we 
should definitely expect multiple input influences, nonlinear 
relations and feedbacks. However, we can think of adopting 
a LCMT in order to detect the "strongest" links in 
the network. As noted in [3], such information could be 
usefully exploited to check if a given portfolio is balanced or 
not. In the following, we report the results obtained by the 
application of our identification technique. 
TABLE I 

THE COHERENCE BASED DISTANCE MATRIX ASSOCIATED TO THE NETWORK TOPOLOGY DEPICTED IN FIG. 1 

Fig. 3. (color online) The tree structure obtained using the proposed identification technique. Every node represents a stock and the color represents the 
business sector it belongs to. The considered sectors are Basic Materials (yellow), Conglomerates (white), Healthcare (pink), Transportation 
(dark blue), Technology (red), Capital Goods (orange), Utilities (brown tints), Consumer (violet tints), Financial (green tints), Energy 
(gray tints) and Services (light blue tints). Using the industry classification given by Google, the Financial sector has also been differentiated among 
Insurance Companies (light green), Banks (medium green) and Investment Companies (dark green); Services have been divided in Information Technology 
(cyan) and Retail (aquamarine); Consumer in Food (plum) and Personal-care (purple); Energy in Oil & Gas (dark gray) and Well Equipment (light gray); 
Utilities in Electrical (dark brown) and Natural Gas (light brown). 

A collection of 100 stocks of the New York Stock Exchange 
has been observed for four weeks (twenty market days), in 
the period 03/03/2008 - 03/28/2008, sampling their prices every 
2 minutes. The stocks have been chosen as the first 100 
stocks with highest trading volume according to the Standard 
& Poor Index at the first day of observation and they are 
reported in Table II. An a-priori organization of the companies 
has been assumed in accordance with the sector and industry 
group classification provided by Google Finance®, that is also 
the source of our data. The whole observation horizon spans 
almost the whole month of March. Hence, the corresponding 
price series cannot be considered stationary and the statistical 
tools cannot be successfully employed to analyze the raw data. 
In the literature a variety of techniques for the suppression of 
trends and periodic components in non-stationary time series 
exists. However, we want to stress that the application of 
such procedures introduces an additional prefiltering phase, 
which increases the computational burden. 
Moreover, due to the pre- and post-market sessions, there 
is a discontinuity between the end value of a day and the 
opening price of the next one. We have avoided those problems 
by noting that the observation horizon is naturally divided 
into subperiods, namely weeks and days. In addition, a single 
market session can be considered a time period sufficiently 
short to assume that the influence of trends and seasonal 
factors is negligible. Thus, in our analysis, we have followed 
the natural approach of dividing the historical series into 
twenty subperiods corresponding to single days. Then, we 
considered the sessions separately, i.e. we have computed the 
coherence-based distances (10) among the stocks for every 



single day. Finally, we have averaged such daily distances 
over the whole observation horizon and the related results have 
been exploited to extract the MST, providing the corresponding 
market structure. 
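
The day-by-day procedure just described can be sketched as below. The function is a hypothetical illustration, not the authors' code: it assumes the data have already been split into one array per market session, and it takes the pairwise distance as a callable so that a coherence-based estimator can be plugged in. The names `average_daily_distances`, `sessions` and `distance`, and the stand-in distance used in the toy call, are assumptions of this example.

```python
import numpy as np

def average_daily_distances(sessions, distance):
    """Average a pairwise distance over separate market sessions.

    sessions: list of arrays, one per day, each of shape (n_stocks, n_samples).
    distance: callable taking two 1-D series and returning a non-negative number.
    """
    n = sessions[0].shape[0]
    total = np.zeros((n, n))
    for day in sessions:
        # Distances are computed within a single session, never across days,
        # so overnight price discontinuities do not enter the estimate.
        for i in range(n):
            for j in range(i + 1, n):
                total[i, j] += distance(day[i], day[j])
    total = total + total.T          # fill the lower triangle by symmetry
    return total / len(sessions)

# Toy call with a stand-in distance (absolute gap between the series means).
day_1 = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0]])
day_2 = np.array([[0.0, 0.0], [3.0, 3.0], [5.0, 5.0]])
D = average_daily_distances([day_1, day_2], lambda a, b: abs(a.mean() - b.mean()))
```

The averaged matrix would then be passed to an MST routine to obtain the market structure.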

We find it useful to remark that the computation of the distances 
for smaller data sets is also computationally cheaper and that the 
averaging procedure provides the desired rejection of trends 
and seasonal components. Notably, a similar idea, even if more 
sophisticated, is at the basis of the method developed in [21] 
to detrend non-stationary time series. 

The final topology is shown in Figure 3. Every node represents 
a stock, and its color represents the business sector or industry 
it belongs to. We note that the stocks are very satisfactorily 
grouped according to their business sectors. We stress that the 
a-priori classification in sectors is not a hard fact by itself 
and we are not trying to match it exactly. A company could 
well be categorized in a sector because of its business but, 
at the same time, show a behaviour similar to, and 
explainable through, the dynamics of other sectors. Actually, 
we would be very interested in finding results of this kind. 
Indeed, in those very cases, our quantitative analysis would 
provide the greatest contribution, detecting in an objective 
way something which is "counter-intuitive". Thus, we just 
use such a-priori classification as a tool to check whether the 
final topology makes sense and whether, at a general level, our 
approach provides useful results. Despite this disclaimer, it is 
worth noting that the Financial (green tints), Consumer (violet 
tints), Basic Materials (yellow), Energy (gray tints) 
and Transportation (dark blue) sectors are all perfectly 
grouped, with no exceptions. In Fig. 3 we note a 
subclusterization of the Financial sector, as well. The Consumer 
sector shows another prominent subclusterization into the Food 
(plum) and Personal/Healthcare (purple) industries, 
while the Energy sector presents an evident subclusterization 
into the Oil & Gas (dark gray) and Oil Well 
Equipment (light gray) industries. The Utilities/Electricity 
companies (dark brown) are, interestingly, a different group. 
We also observe a big cluster of companies classified 
as Services (light blue tints). We have differentiated 
them into the two industries Retail and Information 
Technology using two slightly different colors, respectively 
aquamarine and cyan. We also note the presence of three 
Services companies which are isolated from the other ones: 
V [Verizon], T [AT&T], and S [Sprint]. All of them are 
telephone companies. This might suggest that this industry 
shows at least slightly different dynamics from the 
other service companies. Note also how the Technology 
sector (red) is almost perfectly grouped and how IBM, an IT 
company, even though classified as a Services company, is 
located in it. Finally, the only two automobile companies, GM 
and F [Ford], happen to be linked together. The analysis of these 
four weeks of the month of March cleanly shows a taxonomic 
arrangement of the stocks, even though the choice of a tree 
structure might have seemed quite reductive at first thought. 
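
The statement that a sector is "perfectly grouped" can be checked 
mechanically: a sector is grouped when its stocks form a single 
connected piece of the tree. The following is only a minimal sketch 
of such a check, not the procedure used in this work. The function 
name, the edge list and the sector labels are hypothetical and do not 
reproduce the actual topology of Figure 3. 

```python
from collections import defaultdict


def sector_is_connected(edges, sector_of, sector):
    """Return True if the nodes labelled `sector` induce a connected subgraph."""
    members = {node for node, label in sector_of.items() if label == sector}
    if not members:
        return True  # an empty sector is trivially grouped

    # Keep only the edges whose two endpoints both belong to the sector.
    adjacency = defaultdict(set)
    for u, v in edges:
        if u in members and v in members:
            adjacency[u].add(v)
            adjacency[v].add(u)

    # Depth-first search from an arbitrary member of the sector.
    start = next(iter(members))
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)

    # The sector is grouped only if every member was reached.
    return seen == members


# Hypothetical toy tree (illustrative only, not the tree of Figure 3).
edges = [("GM", "F"), ("F", "XOM"), ("XOM", "COP"), ("COP", "T"), ("T", "HAL")]
sector_of = {
    "GM": "Consumer Cyclical",
    "F": "Consumer Cyclical",
    "XOM": "Energy",
    "COP": "Energy",
    "HAL": "Energy",
    "T": "Services",
}

for name in sorted(set(sector_of.values())):
    print(name, sector_is_connected(edges, sector_of, name))
```

In this toy example the Energy sector is not grouped, because HAL is 
attached to the rest of the tree only through a Services node. 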

VI. Conclusions 

This work has illustrated a simple but effective procedure 
to identify the structure of a network of linear dynamical 
systems when the topology is described by a tree. To the 
best knowledge of the authors, the problem of identifying a 
network has not yet been tackled in the scientific literature. The 
approach followed in this paper is based on the definition of 
a distance function used to evaluate whether there exists a direct 
link between two nodes. A few theoretical results are provided, 
in particular to guarantee the correctness of the identification 
procedure. An application of the technique to real data has 
also shown that a tree topology can be sufficient to capture 
information even in complex situations such as financial stock 
prices. 
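
As an illustration of how a distance function can be turned into a 
tree topology, the sketch below extracts a minimum spanning tree from 
a matrix of pairwise distances using Prim's algorithm. It assumes 
that the tree is obtained as the minimum spanning tree of the 
distances, a step this section does not spell out; the function name 
and the toy distance matrix are hypothetical and are not taken from 
the data analyzed above. 

```python
def minimum_spanning_tree(dist):
    """Return the edges (i, j) of a minimum spanning tree of a symmetric
    distance matrix `dist`, computed with Prim's algorithm."""
    n = len(dist)
    if n == 0:
        return []

    in_tree = [False] * n
    best = [float("inf")] * n   # cheapest known distance to the growing tree
    parent = [-1] * n           # tree node realizing that distance
    best[0] = 0.0               # start growing the tree from node 0
    tree_edges = []

    for _ in range(n):
        # Pick the node outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            tree_edges.append((parent[u], u))

        # Update the distances of the remaining nodes to the tree.
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u

    return tree_edges


# Hypothetical distances generated by a chain 0 - 1 - 2 - 3.
dist = [
    [0.0, 1.0, 2.0, 3.0],
    [1.0, 0.0, 1.0, 2.0],
    [2.0, 1.0, 0.0, 1.0],
    [3.0, 2.0, 1.0, 0.0],
]

print(minimum_spanning_tree(dist))
```

For n nodes the result always contains n - 1 edges; in the toy case 
the recovered edges are exactly the links of the chain that 
generated the distances. 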

Name | Code | Sector 
3M Company | MMM | Conglomerates 
Abbott Laboratories | ABT | Healthcare 
AES Corporation | AES | Utilities 
Alcoa Inc. | AA | Basic Materials 
Allegheny Technologies Inc. | ATI | Basic Materials 
Allstate Corporation | ALL | Financial 
Altria Group | MO | Consumer/Non-Cyclical 
American Electric Power | AEP | Utilities 
American Express | AXP | Financial 
American International Group | AIG | Financial 
Amgen Inc. | AMGN | Healthcare 
Anheuser Busch | BUD | Consumer/Non-Cyclical 
Apple Inc. | AAPL | Technology 
AT&T | T | Services 
Avon Products | AVP | Consumer/Non-Cyclical 
Baker Hughes Inc. | BHI | Energy 
Bank of America | BAC | Financial 
Bank of New York Mellon | BK | Financial 
Baxter International | BAX | Healthcare 
Boeing | BA | Capital Goods 
Bristol Myers Squibb | BMY | Healthcare 
Burlington Northern Santa Fe | BNI | Transportation 
Campbell Soup | CPB | Consumer/Non-Cyclical 
Capital One Financial | COF | Financial 
Caterpillar Inc. | CAT | Capital Goods 
CBS | CBS | Services 
Chevron | CVX | Energy 
CIGNA | CI | Financial 
Cisco Systems | CSCO | Technology 
Citigroup Inc | C | Financial 
Clear Channel Communications | CCU | Services 
Coca-Cola | KO | Consumer/Non-Cyclical 
Colgate Palmolive | CL | Consumer/Non-Cyclical 
Comcast | CMCSA | Services 
Conoco Phillips | COP | Energy 
Covidien | COV | Healthcare 
CVS Caremark | CVS | Services 
Dell Inc | DELL | Technology 
Dow Chemical Company | DOW | Basic Materials 
E.I. du Pont de Nemours | DD | Basic Materials 
El Paso | EP | Utilities 
EMC | EMC | Technology 
Entergy | ETR | Utilities 
Exelon | EXC | Utilities 
Exxon Mobil | XOM | Energy 
FedEx | FDX | Transportation 
Ford Motor | F | Consumer Cyclical 
General Dynamics | GD | Capital Goods 
General Electric | GE | Conglomerates 
General Motors | GM | Consumer Cyclical 
Goldman Sachs Group | GS | Financial 
Google Inc. | GOOG | Technology 
Halliburton | HAL | Energy 
Hartford Financial Services | HIG | Financial 
H. J. Heinz | HNZ | Consumer/Non-Cyclical 
Hewlett-Packard | HPQ | Technology 
Home Depot | HD | Services 
Honeywell International | HON | Capital Goods 
Intel | INTC | Technology 
International Business Machines | IBM | Services 
International Paper | IP | Basic Materials 
Johnson & Johnson | JNJ | Healthcare 
JPMorgan Chase | JPM | Financial 
Kraft Foods | KFT | Consumer/Non-Cyclical 
Lehman Brothers Holding | LEH | Financial 
McDonald's | MCD | Services 
Medtronic | MDT | Healthcare 
Merck | MRK | Healthcare 
Merrill Lynch | MER | Financial 
Microsoft | MSFT | Technology 
Morgan Stanley | MS | Financial 
Norfolk Southern Group | NSC | Transportation 
NYSE Euronext | NYX | Financial 
Oracle | ORCL | Technology 
PepsiCo | PEP | Consumer/Non-Cyclical 
Pfizer Inc. | PFE | Healthcare 
Procter & Gamble | PG | Consumer/Non-Cyclical 
Raytheon | RTN | Conglomerates 
Regions Financial | RF | Financial 
Rockwell Automation | ROK | Technology 
Sara Lee | SLE | Consumer/Non-Cyclical 
Schlumberger Limited | SLB | Energy 
Southern | SO | Utilities 
Sprint Nextel | S | Services 
Target | TGT | Services 
Texas Instruments Inc. | TXN | Technology 
Time Warner | TWX | Services 
Tyco International | TYC | Conglomerates 
U. S. Bancorp | USB | Financial 
United Parcel Service | UPS | Transportation 
United Technologies | UTX | Conglomerates 
UnitedHealth Group Inc. | UNH | Financial 
Verizon Communications | VZ | Services 
Wachovia | WB | Financial 
Wal-Mart Stores | WMT | Services 
Walt Disney | DIS | Services 
Wells Fargo | WFC | Financial 
Weyerhaeuser Company | WY | Basic Materials 
Williams Companies | WMB | Utilities 
Xerox | XRX | Technology 



TABLE II 

List of the companies considered in the analysis 



